Abstract. We present a new CSP- and SAT-based approach for coordinating
interfaces of distributed stream-connected components provided
as closed-source services. The Kahn Process Network (KPN) is taken
as a formal model of computation and a Message Definition Language
(MDL) is introduced to describe the format of messages communicated
between the processes. MDL links input and output interfaces of a node
to support flow inheritance and contextualisation. Since interfaces can
also be linked by the existence of a data channel between them, the match
is generally not only partial but also substantially nonlocal. The KPN
communication graph thus becomes a graph of interlocked constraints
to be satisfied by specific instances of the variables. We present an
algorithm that solves the CSP by iterative approximation while generating
an adjunct Boolean SAT problem on the way. We developed a solver in
OCaml as well as tools that analyse the source code of KPN vertices to
derive MDL terms and automatically modify the code by propagating
type definitions back to the vertices after the CSP has been solved.
Techniques and approaches are illustrated on a KPN implementing an image
processing algorithm as a running example.

Keywords: coordination programming, component programming, Kahn Process
Networks, interface coordination, constraint satisfaction, satisfiability

1 Introduction 

Software-intensive systems have reached unprecedented scale by every measure:
number of lines of code; number of people involved in the development;
number of dependencies between software components; and amount of data
stored and manipulated [1]. Many of them include heterogeneous elements that
come from a variety of sources: parts of them are written in different
languages and tuned for different hardware/software platforms. Furthermore, when
the software is developed and modified by dispersed teams, inconsistencies in
the design, implementation and usage are unavoidable. This leads to clashes of
assumptions about operation cost, resource availability and algorithm processing
rate. Last but not least, parts of the system are constantly changing. Many
elements need to be replaced without negative effects on the performance or
behaviour of the rest of the system.


One way to attack the software challenge is to suggest a component-based 
design: a program is designed as a set of components, each represented by an 
interface that specifies how they can be used in an application, and one or more 
implementations which define their actual behaviour. When a designer of the 
application uses a component, they agree to rely only on the interface
specification. Similarly, a developer who creates an implementation for a component is
unaware of the context where the component will be used. An algorithm that 
specifies the behaviour depends solely on self-contained input and its result is 
produced in the form of a message without a specific destination address. 

A Kahn Process Network (KPN), introduced by G. Kahn [2], is a collection
of stream-connected algorithmic building blocks, which are fully independent
single-threaded processes that lack a global state. The execution of the
network generally requires a supervisory coordination program that manages the
progress of the blocks and provides a message-communication infrastructure
for the streams. Since all domain-specific computations are performed by
the sequential processes inside the blocks, programming is naturally separated
into algorithm and concurrency engineering [3]. The coordination language is
responsible for component orchestration, namely 1) dynamic load control and
adaptivity for a changing environment; 2) access control to shared resources; and
3) communication safety between components. This paper focuses on the last
aspect. Component-based design requires an implementation of a single component
to be independent from the rest of the network. This raises a number of software
engineering issues: components' interfaces are required to be specific enough so
that components are aware of the data structures communicated between them and,
at the same time, generic enough to facilitate decontextualisation and software
reuse.

In this paper we present a solution to the interface reconciliation problem
for an interface definition language specifically designed for KPNs. We
demonstrate a static mechanism (based on solving a Constraint Satisfaction Problem
(CSP) and SAT) that checks compatibility of component interfaces connected in
a network with support for overloading and structural subtyping. This allows one
to design completely decontextualised components, so that they may be reused
in different contexts without changing the code. This is especially important
when the components are provided as a compiled library whose source is either
private or unavailable. The components are compatible with a potentially
unlimited number of input/output data formats coming from the environment.
We also introduce a flow inheritance [4] mechanism: put simply, a message sent
from one component to another may also be required to contain additional data
which, although not needed by the recipient itself, is nevertheless required by a
component that the recipient sends its own messages to (Fig. 1).

We propose a Message Definition Language (MDL) that enables components’ 
generic interfaces as well as subtyping and flow inheritance; we then recast the 
interface reconciliation problem as a CSP for the interface variables and propose 
an original solution algorithm that solves it by iterative approximation while 
generating an adjunct Boolean SAT problem on the way. 


We designed and implemented a communication protocol for components 
coded in C++ to demonstrate the capabilities of MDL. We developed tools that 
1) automatically derive MDL interfaces from the source code; 2) generate a set of 
constraints given a netlist that describes the topology of the network; 3) solve
the CSP; and 4) based on the solution of the CSP generate compilable code for 
every component with some API provided for run-time support. 

The process is similar to template specialisation in C++; however, in our case,
constraints that are produced by a pair of vertices may affect the whole network, 
and, consequently, a global constraint satisfaction procedure is required. In this 
paper we provide a formal description of MDL, the CSP definition and the 
algorithm designed to solve the CSP. 

Throughout the paper we demonstrate the utility of the proposed approach 
on a practical example: an image segmentation algorithm based on k-means 
clustering (Fig. 2).

Related work. Linda [5] is the first language designed to separate
computation and coordination models. It is based on a simple tuplespace model. One
of the disadvantages of the model is that knowledge about the communication
protocol is required while implementing the processes. The problem of
separation of concerns has not been solved in Concurrent Collections from
Intel [6] (Linda's successor). Therefore, generic components, which may be reused
in multiple contexts without being modified, are not supported in the language.

In the programming language Reo [7] components are communicating through 
hierarchical connectors that coordinate their activities and manipulate message 
dataflow. Similar to our approach, a constraint satisfaction engine, which finds a 
solution that specifies a valid interaction between components, is implemented. 
S. Kemper describes a SAT-based verification of Timed Constraint Automata 
that is used for coordination of communicating components [8] as well as in 
Reo. However, the research mostly focuses on the design of reusable interaction 
protocols and lacks the description of reusable component interfaces. 

In previous years there were attempts to design efficient programming
languages and run-time systems for parallel programming based on KPNs; however,
the interface reconciliation problem stemming from nonlocal inheritance in
KPNs has not been given enough attention.

2 Motivation 

Kahn Process Networks is a concurrency model that introduces data streams 
in the form of sequential channels that connect independent processes into a 
network. Decontextualisation of processes is an advantage of the model. Since 
processes do not share any data, a process’s conformity with the context is 
defined by its interface, which describes the kinds of message that the process 

1 for the avoidance of doubt we state that the term “protocol” is used here in the 
sense of ‘convention governing the structure and interpretation of messages’ and not 
in any state-transition sense 
Fig. 1: Illustration of flow inheritance. A component A can process a value of type
X and return a value of type Y as a result. However, an input message contains
not only an element of type X, but also an element of type Z. The latter can be
processed by a component B. Flow inheritance provides a mechanism for partial
message processing in a pipelined fashion.

can send and receive. Our goal is to provide a means of interface coordination 
that supports genericity, i.e. the ability of an interface to function correctly in a 
variety of contexts. 

The distributed components are commonly provided as closed-source services.
Each service contains multiple processing functions compatible with a
variety of contexts. The input interface of a component is specialised based on
the message format that the message producer can produce, and, symmetrically,
the output interface is specialised based on the consumer's requirements.
For example, one can define a component that contains two functions
with type signatures Int -> Int and Int -> String. The functions implement
algorithms that compute different values given the same input. The task is to
statically choose the algorithm based on the consumer's requirements. This also
demonstrates that input and output types in the interfaces of the KPN
components are treated in the same manner.

The latter makes services fundamentally different from functions. In
functional languages, a type signature that corresponds to the interface of the
component in the example can be defined by the intersection type Int -> Int ∧
Int -> String, which is unsound due to its ambiguity. In functional languages,
the return type of a function depends solely on input argument types. In
contrast, the interfaces of the KPN components form context-dependent relations
that offer a selection of output types to a consumer. This makes the typing
decisions essentially nonlocal and genuinely multiple.
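
As a rough illustration of choosing an implementation from the consumer's side, the following sketch picks the overloads whose output type matches what the consumer requires. The table `OVERLOADS`, the function `select_overloads` and the use of exact type equality (no subtyping) are assumptions made for this example only; they are not part of MDL or of the tools described in this paper.

```python
# Hypothetical component with two overloads that share the same input type.
# Each entry maps an (assumed) function name to (input type, output type).
OVERLOADS = {
    "int_to_int": ("Int", "Int"),
    "int_to_string": ("Int", "String"),
}


def select_overloads(overloads, produced_type, required_type):
    """Return the names of overloads usable in the given context.

    produced_type is what the upstream producer sends; required_type is
    what the downstream consumer expects. Both sides constrain the choice,
    which is why the decision cannot be made from the input type alone.
    """
    return sorted(
        name
        for name, (inp, out) in overloads.items()
        if inp == produced_type and out == required_type
    )
```

Here `select_overloads(OVERLOADS, "Int", "String")` keeps only `int_to_string`, so the other function could be left out of the generated code for that context.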

The problem being solved can be seen as a type inference problem; however, 
it cannot be solved using conventional type inference mechanisms based on
first-order unification due to the presence of polymorphic output types and potential
cyclic dependencies in the network (the example in Fig. 2 contains a back edge).

A common communication pattern in streaming networks is a pipeline, where
a message travels along a chain of components that work on its content. A
component can accept a subtype of its input type, and the part of the message
that it does not use may be bypassed to another component down the pipeline if
it contains data that the later component will need (Fig. 1). Two modes of flow
inheritance are envisaged, considered next.

Flow inheritance for records. The fundamental type of a message in a
variety of systems is record, which is a collection of label-value pairs. Each
component processes only a specific set of pairs; however, the pairs that the component
does not require may be bypassed to the output, so they can be processed in the
next stages of the pipeline if they are required. For example, a message that
represents a geometric shape and has a type {x: float, y: float, radius: float}
may be processed in two steps: the first component processes the position of the
shape as defined by the pairs x and y, and the second one needs the pair labelled
radius.

Fig. 2: An image segmentation algorithm based on k-means clustering that is
implemented as a Kahn Process Network
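
The record case can be sketched in a few lines. This is a minimal illustration under stated assumptions: records are modelled as Python dictionaries, and the helper `run_with_record_inheritance` and the two toy components are hypothetical names invented here, not the run-time API of our implementation.

```python
import math


def run_with_record_inheritance(required, process, message):
    """Apply `process` to the required fields and bypass the rest.

    required: labels the component needs; message: dict of label -> value.
    Fields that the component does not require are copied to the output
    unchanged, so that later components in the pipeline can use them.
    """
    missing = [label for label in required if label not in message]
    if missing:
        raise KeyError(f"message lacks required fields: {missing}")
    consumed = {label: message[label] for label in required}
    bypassed = {l: v for l, v in message.items() if l not in required}
    # Fields produced by the component take precedence over bypassed ones.
    return {**bypassed, **process(consumed)}


def shift(rec):
    # First component: only looks at the position of the shape.
    return {"x": rec["x"] + 1.0, "y": rec["y"] + 1.0}


def area(rec):
    # Second component: only needs the radius.
    return {"area": math.pi * rec["radius"] ** 2}


shape = {"x": 0.0, "y": 2.0, "radius": 1.0}
after_shift = run_with_record_inheritance(["x", "y"], shift, shape)
after_area = run_with_record_inheritance(["radius"], area, after_shift)
```

The field `radius` survives the first stage even though `shift` never mentions it, which is the behaviour that flow inheritance for records is meant to capture.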

Flow inheritance for variants. OOP extensively uses overloading m to 
improve modularity and reusability of code. Similarly, support of polymorphic 
components in KPNs facilitates their reuse in different contexts. At the top level 
we see a component's interface as a collection of alternative label-record pairs,
called variants, where the label corresponds to the particular implementation 
that can process the message defined by the given record, e.g. 

(: cart: {x:float, y:float}, polar: {r:float, phi:float} :). 

Here the colonised parentheses delimit the collection of variants and each variant 
is associated with a record written as a set of label-value pairs. Any message that 
does not belong to one of the accepted variants must cause an error unless there 
exists another component further down in the pipeline that can process it. In 
this case, the message should be bypassed to the recipient. 
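
A possible reading of this routing rule is sketched below. The function `route`, the handler table and the set of downstream labels are illustrative assumptions; in our approach the decision is made statically by the solver rather than at run time.

```python
def route(message, handlers, downstream_labels):
    """Decide what a component does with a variant-tagged message.

    message: (label, record) pair; handlers: label -> function for the
    variants this component accepts; downstream_labels: variants accepted
    by some component further down the pipeline.
    """
    label, record = message
    if label in handlers:
        return ("processed", handlers[label](record))
    if label in downstream_labels:
        return ("bypassed", message)  # flow inheritance for variants
    raise ValueError(f"no component can process variant {label!r}")


# Hypothetical component that accepts only the 'cart' variant.
handlers = {"cart": lambda r: {"norm2": r["x"] ** 2 + r["y"] ** 2}}
downstream = {"polar"}

processed = route(("cart", {"x": 3.0, "y": 4.0}), handlers, downstream)
bypassed = route(("polar", {"r": 1.0, "phi": 0.0}), handlers, downstream)
```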

In streaming networks flow inheritance alleviates the problem and makes 
configuration of individual interfaces independent from each other. In our work 
we developed a solution for the interface reconciliation based on the CSP with 
support of flow inheritance for records and variants. Our mechanism statically 
detects implementation variants in components that are not required in the 
context, which is important for applications running in the Cloud where a user 
is charged proportionally to the amount of resources their application uses. 

Example. As a running example we use our implementation mi of an image 
segmentation algorithm based on k-means clustering m■ The application's
KPN graph is shown in Fig. 2. The network represents a pipeline composed of
three components: 


— The component read opens an image file using an input message $M_{r1}$ with
the file name, and sends it to the first output channel. The component
contains 3 functions that overload the component's behaviour (i.e. the input
interface of the component is defined by 3 variants): 1) $V_{r1}$ loads the colour image
in RGB format; 2) $V_{r2}$ loads the greyscale image as an intensity one; and
3) $V_{r3}$ loads the image as it is stored in the file.

— The component init sets initial parameters for the k-means algorithm. The
component contains one processing function $V_{i1}$. The input message can
either come from the component read or, if the image has been opened and
preprocessed before, from the environment as an input message $M_{i1}$. The input
message must contain the number of clusters K and the image itself.

— The kMeans component represents an iterative implementation (defined as a
function $V_{k1}$) of the k-means algorithm. The result of each iteration is sent
to the first output channel, which is circuited back to the input channel of
the component itself. This kind of design gives an opportunity to manage
system load at run time and execute the next iteration only when sufficient
resources are available. Once the cluster centres have converged, the
algorithm yields the result to the second output channel.

Using flow inheritance for variants, $M_{i1}$ is routed directly to the init
component, bypassing the component read. Using flow inheritance for records, a parameter
K that is contained in $M_{r1}$ is implicitly bypassed through read to init.

The interface reconciliation algorithm is capable of finding out that $V_{r2}$ and
$V_{r3}$ are not used with the provided input, and functions containing the
implementations will be excluded from the generated code.

3 Message Definition Language 

Now we define the Message Definition Language (MDL) that describes component
interfaces. Each component has its associated input and output interface
terms. A message is a collection of data entities, defined by a corresponding
collection of terms that can contain term variables, Boolean variables and Boolean
expressions.

Each term is either atomic or a collection in its own right. Atomic terms are 
symbols, which are identifiers used to represent standard C++ types, such as int 
or string. To account for subtyping (including the kinds that are not present in 
C++) we include three categories of collections (see Fig. 3): tuples that demand
exact match and thus admit no structural subtyping, records that are subtyped 
covariantly (a larger record is a subtype) and choices that are contravariantly 
subtyped using set inclusion (a smaller choice is a subtype). The intention of 
these terms is to represent 

1. extensible data records Bam, where additional named fields can be introduced
without breaking the match between the producer and the consumer
and where fields can also be inherited from input to output records by lowering
the output type, which is always safe;

2. data-record variants, where generally more variants can be accepted by the
consumer than the producer is aware of, and where such additional variants
can be inherited from the output back to the input of the producer (hence
contravariance), again by raising the input type, which is also always safe.




Term variables correspond to four categories of terms. However, for the
correctness of the algorithm it is important to distinguish variables that represent
choices from variables that represent other term categories (due to two kinds of
subtyping defined by the seniority relation in Definition 1). We use an up-coerced
term variable, e.g. $\uparrow a$, to represent a choice term and a down-coerced term
variable, e.g. $\downarrow a$, to represent any other term, i.e. a symbol, a tuple or a record.
Formally,

⟨term variable⟩ ::= ↑⟨identifier⟩ | ↓⟨identifier⟩

We use the symbol $\square$ instead of the $\uparrow$ or $\downarrow$ symbols in contexts where the sort is
unimportant, e.g. $\square a$ is a term variable that can be either up-coerced or
down-coerced.

For brevity, term variables are called variables, Boolean variables are called
flags and Boolean expressions are called guards. The following grammar specifies
the guards:

⟨bool⟩ ::= (⟨bool⟩ ∧ ⟨bool⟩) | (⟨bool⟩ ∨ ⟨bool⟩) | ¬⟨bool⟩ | true | false | ⟨flag⟩

MDL terms are built recursively using the constructors: tuple, record, choice
and switch, according to the following grammar:

⟨term⟩ ::= ⟨symbol⟩ | ⟨term variable⟩ | ⟨tuple⟩ | ⟨record⟩ | ⟨choice⟩ | ⟨switch⟩
⟨tuple⟩ ::= (⟨term⟩ [⟨term⟩]*)
⟨record⟩ ::= {[⟨label⟩(⟨bool⟩): ⟨term⟩ [, ⟨label⟩(⟨bool⟩): ⟨term⟩]* [| ↓⟨identifier⟩]]}
⟨choice⟩ ::= (:[⟨label⟩(⟨bool⟩): ⟨term⟩ [, ⟨label⟩(⟨bool⟩): ⟨term⟩]* [| ↑⟨identifier⟩]]:)
⟨label⟩ ::= ⟨symbol⟩

Informally, a tuple is an ordered collection of terms and a record is an
extensible, unordered collection of guarded labeled terms, where labels are arbitrary
symbols, which are unique within a single record. A choice is a collection of 
alternative terms. The syntax of choice is the same as that of record except for 
the delimiters. The difference between records and choices is in subtyping and 
will become clear below when we define seniority on terms. We use choices to 
represent polymorphic messages and component interfaces. 

Records and choices are defined in tail form. The tail is denoted by a variable
that represents a term of the same kind as the construct in which it occurs. For
example, in the term $\{l_1(\text{true}): t_1, \dots, l_n(\text{true}): t_n \mid \downarrow v\}$ the variable $\downarrow v$ represents
the tail of the record, i.e. its members with labels $l$ such that $l \neq l_1, \dots, l \neq l_n$. A switch
is a set of unlabeled (by contrast to a choice) guarded alternatives.

⟨switch⟩ ::= <⟨bool⟩: ⟨term⟩ [, ⟨bool⟩: ⟨term⟩]*>

Exactly one guard must be true for any valid switch. The switch is
substitutionally equivalent to the term marked by the true guard:

$$\langle \text{false}: t_1, \dots, \text{true}: t_i, \dots, \text{false}: t_n \rangle = \langle \text{true}: t_i \rangle = t_i.$$

The switch is an auxiliary construct intended for building conditional terms.
For example, $\langle a: \text{int}, \neg a: \text{string} \rangle$ represents the symbol int if $a = \text{true}$, and the
symbol string otherwise.
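
The guard grammar and the switch rule can be made concrete with a small evaluator. This is only a sketch: the encoding of guards as nested tuples, and the names `eval_guard` and `resolve_switch`, are assumptions chosen for the example and do not reflect the data structures of our OCaml solver.

```python
def eval_guard(guard, flags):
    """Evaluate a guard under an assignment of Boolean values to flags.

    Assumed encoding: True/False, a flag name (str), ("not", g),
    ("and", g1, g2) or ("or", g1, g2).
    """
    if isinstance(guard, bool):
        return guard
    if isinstance(guard, str):
        return flags[guard]
    op = guard[0]
    if op == "not":
        return not eval_guard(guard[1], flags)
    if op == "and":
        return eval_guard(guard[1], flags) and eval_guard(guard[2], flags)
    if op == "or":
        return eval_guard(guard[1], flags) or eval_guard(guard[2], flags)
    raise ValueError(f"unknown guard constructor: {op!r}")


def resolve_switch(alternatives, flags):
    """Replace a switch by the single alternative whose guard is true."""
    chosen = [term for guard, term in alternatives if eval_guard(guard, flags)]
    if len(chosen) != 1:
        raise ValueError("a valid switch has exactly one true guard")
    return chosen[0]


# The switch <a: int, ¬a: string> from the text.
switch = [("a", "int"), (("not", "a"), "string")]
```

With this encoding, `resolve_switch(switch, {"a": True})` yields the symbol `int` and the assignment `{"a": False}` yields `string`.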


For each term $t$, we use $\mathcal{V}^{\uparrow}(t)$ to denote the set of up-coerced term variables
that occur in $t$, $\mathcal{V}^{\downarrow}(t)$ to denote the set of the down-coerced ones, and $\mathcal{F}(t)$ to
denote the set of flags.

A term $t$ is called semi-ground if it does not contain variables, i.e.
$\mathcal{V}^{\uparrow}(t) \cup \mathcal{V}^{\downarrow}(t) = \emptyset$. A term $t$ is called ground if it is semi-ground and does not contain
flags, i.e. $\mathcal{V}^{\uparrow}(t) \cup \mathcal{V}^{\downarrow}(t) \cup \mathcal{F}(t) = \emptyset$.

A term $t$ is well-formed if it is ground and one of the following holds:

1. $t$ is a symbol.

2. $n > 0$ and $t$ is a tuple $(t_1 \dots t_n)$ where all $t_i$, $0 < i \le n$, are well-formed.

3. $n \ge 0$ and $t$ is a record $\{l_1(b_1): t_1, \dots, l_n(b_n): t_n\}$ where for all $0 < i \neq j \le n$,
$b_i \wedge b_j \to l_i \neq l_j$ and all $t_i$ for which $b_i$ are true are well-formed.

4. $n \ge 0$ and $t$ is a choice $(: l_1(b_1): t_1, \dots, l_n(b_n): t_n :)$ where for all $0 < i \neq j \le n$,
$b_i \wedge b_j \to l_i \neq l_j$ and all $t_i$ for which $b_i$ are true are well-formed.

5. $n > 0$ and $t$ is a switch $\langle b_1: t_1, \dots, b_n: t_n \rangle$ where for some $1 \le i \le n$, $b_i = \text{true}$
and $t_i$ is well-formed and where $b_j = \text{false}$ for all $j \neq i$.

If an element of a record, choice or switch has a guard that is equal to false,
then the element can be omitted, e.g.

$$\{l_1(b_1): t_1,\ l_2(\text{false}): t_2,\ l_3(b_3): t_3\} = \{l_1(b_1): t_1,\ l_3(b_3): t_3\}.$$

If an element of a record or a choice has a guard that is true, the guard can be
syntactically omitted, e.g.

$$\{l_1(b_1): t_1,\ l_2(\text{true}): t_2,\ l_3(b_3): t_3\} = \{l_1(b_1): t_1,\ l_2: t_2,\ l_3(b_3): t_3\}.$$


We define the canonical form of a well-formed collection as a representation that 
does not include false guards, and we omit true guards anyway. The canonical 
form of a switch is its (only) term with a true guard, hence any term in canonical 
form is switch-free. 

Next we introduce a seniority relation on terms for the purpose of structural
subtyping. In the sequel we use nil to denote the empty record { }, which has
the meaning of void type in C++ and represents a message without any data. 
Similarly, we use none to denote the empty choice (: :). 

Definition 1 (Seniority relation). The seniority relation $\sqsubseteq$ on well-formed
terms is defined in canonical form as follows:

1. $\text{none} \sqsubseteq t$ if $t$ is a choice.

2. $t \sqsubseteq \text{nil}$ if $t$ is any term but a choice.

3. $t \sqsubseteq t$.

4. $t_1 \sqsubseteq t_2$, if for some $k, m > 0$ one of the following holds:

(a) $t_1 = (t_1^1 \dots t_1^k)$, $t_2 = (t_2^1 \dots t_2^k)$ and $t_1^i \sqsubseteq t_2^i$ for each $1 \le i \le k$;

(b) $t_1 = \{l_1^1: t_1^1, \dots, l_1^k: t_1^k\}$ and $t_2 = \{l_2^1: t_2^1, \dots, l_2^m: t_2^m\}$, where $k \ge m$ and
for each $j \le m$ there is $i \le k$ such that $l_1^i = l_2^j$ and $t_1^i \sqsubseteq t_2^j$;

(c) $t_1 = (: l_1^1: t_1^1, \dots, l_1^k: t_1^k :)$ and $t_2 = (: l_2^1: t_2^1, \dots, l_2^m: t_2^m :)$, where $k \le m$ and
for each $i \le k$ there is $j \le m$ such that $l_1^i = l_2^j$ and $t_1^i \sqsubseteq t_2^j$.
Fig. 3: Two semilattices representing the seniority relation for terms of different
categories. The lower terms are the subtypes of the upper ones.

Given the relation $t \sqsubseteq t'$, we say that $t'$ is senior to $t$ and $t$ is junior to $t'$.
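
To make Definition 1 easier to experiment with, here is a minimal sketch of the relation for ground terms in canonical form. The encoding is an assumption made for this example: symbols are strings, tuples are Python tuples, records are dictionaries and choices are instances of a hypothetical `Choice` class. The function name `junior` is likewise invented here.

```python
class Choice(dict):
    """A choice term (: l1: t1, ... :); a plain dict models a record."""


NIL = {}          # the empty record { }
NONE = Choice()   # the empty choice (: :)


def junior(t1, t2):
    """Return True if t1 is junior to t2 (ground, canonical terms only)."""
    t1_choice, t2_choice = isinstance(t1, Choice), isinstance(t2, Choice)
    if t1_choice != t2_choice:
        return False  # choices are comparable only with choices
    if t1_choice:
        # Rule 4(c); rule 1 is the special case where t1 is empty.
        return all(l in t2 and junior(t, t2[l]) for l, t in t1.items())
    if isinstance(t2, dict) and not t2:
        return True   # rule 2: every non-choice term is junior to nil
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2  # rule 3 for symbols
    if isinstance(t1, tuple) and isinstance(t2, tuple):
        # Rule 4(a): same length, component-wise seniority.
        return len(t1) == len(t2) and all(
            junior(a, b) for a, b in zip(t1, t2)
        )
    if isinstance(t1, dict) and isinstance(t2, dict):
        # Rule 4(b): the junior record has at least the senior's fields.
        return all(l in t1 and junior(t1[l], t) for l, t in t2.items())
    return False


circle = {"x": "float", "y": "float", "radius": "float"}
point = {"x": "float", "y": "float"}
cart = Choice(cart=point)
both = Choice(cart=point, polar={"r": "float", "phi": "float"})
```

For instance, `junior(circle, point)` holds because the larger record is the subtype, while `junior(cart, both)` holds because the smaller choice is the subtype.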

Proposition 1. The seniority relation $\sqsubseteq$ is trivially a partial order, and $(T, \sqsubseteq)$
is a pair of upper and lower semilattices (Fig. 3).

The seniority relation represents the subtyping relation on terms. If a term
$t'$ describes the input interface of a component, then the component can process
any message described by a term $t$, such that $t \sqsubseteq t'$.

Although the seniority relation is straightforwardly defined for ground terms, 
terms that are present in the interfaces of components can contain variables and 
flags. Finding such ground term values for the variables and such Boolean values 
for the flags that the seniority relation holds represents a CSP problem, which 
is formally introduced next. 

4 Constraint Satisfaction Problem for KPN 

In this section we define a Constraint Satisfaction Problem for Kahn Process
Networks (CSP-KPN). We regard a KPN network as a directed weakly
connected labeled graph $G = (V, E, L)$, where

1. $V$ is a set of vertices. The vertices correspond to individual Kahn processes.

2. $E$ is a set of edges, where each edge $e \in E$ is an ordered pair of vertices $(v, v')$,
$v, v' \in V$. The edges correspond to channels connecting Kahn processes.

3. A function $L\colon E \to \mathit{Term} \times \mathit{Term}$ assigns a label³ to each edge $e \in E$, which
represents a pair of MDL terms $L(e) = t \sqsubseteq t'$ called a constraint⁴. It
defines the input requirements and the output properties associated with the
channel.

Given a graph $G = (V, E, L)$ we define the set of constraints as

$$C(G) = \bigcup_{e \in E} L(e),$$

3 we use the same word “label” to refer to the mark on a graph edge and the symbol 
that labels a term in a record or a choice; however our intention is always clear from 
the context. 

4 in the rest of the paper the symbol $\sqsubseteq$ denotes the seniority relation for a pair of ground
terms; alternatively, if the terms are not ground, $\sqsubseteq$ specifies a constraint.








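the sets of up-coerced term variables $\mathcal{V}^{\uparrow}(G)$ and down-coerced term variables
$\mathcal{V}^{\downarrow}(G)$ as

$$\mathcal{V}^{\uparrow}(G) = \bigcup_{t \sqsubseteq t' \in C(G)} \mathcal{V}^{\uparrow}(t) \cup \mathcal{V}^{\uparrow}(t') \quad\text{and}\quad \mathcal{V}^{\downarrow}(G) = \bigcup_{t \sqsubseteq t' \in C(G)} \mathcal{V}^{\downarrow}(t) \cup \mathcal{V}^{\downarrow}(t'),$$

and the set of flags $\mathcal{F}(G)$ as

$$\mathcal{F}(G) = \bigcup_{t \sqsubseteq t' \in C(G)} \mathcal{F}(t) \cup \mathcal{F}(t').$$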

Assume a vector of flags $f = (f_1, \dots, f_l)$, a vector of term variables
$\square v = (\square v_1, \dots, \square v_m)$, a vector of Boolean values $b = (b_1, \dots, b_l)$ and a vector of terms
$s = (s_1, \dots, s_m)$. Then for each term $t$

1. $t[f/b]$ denotes the term obtained as a result of the simultaneous replacement
of $f_i$ with $b_i$ for each $1 \le i \le l$;

2. $t[\square v/s]$ denotes the term obtained as a result of the simultaneous
replacement of $\square v_i$ with $s_i$ for each $1 \le i \le m$;

3. $t[f/b, \square v/s]$ is a shortcut for $t[f/b][\square v/s]$.
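
The substitution notation can be illustrated on a simplified term encoding. The sketch assumes that guards are either Boolean constants or single flag names, that records are dictionaries mapping a label to a (guard, term) pair, and that tail variables are not modelled; `Var` and `substitute` are hypothetical names used only here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A term variable, e.g. Var("↑a") or Var("↓b")."""
    name: str


def substitute(term, flags, variables):
    """Compute t[f/b, v/s] for the simplified encoding described above.

    flags: flag name -> bool; variables: variable name -> term.
    Flags and variables without an assigned value are left in place.
    """
    if isinstance(term, Var):
        return variables.get(term.name, term)
    if isinstance(term, tuple):
        return tuple(substitute(t, flags, variables) for t in term)
    if isinstance(term, dict):
        result = {}
        for label, (guard, sub) in term.items():
            value = flags.get(guard, guard) if isinstance(guard, str) else guard
            if value is False:
                continue  # an element with a false guard can be omitted
            result[label] = (value, substitute(sub, flags, variables))
        return result
    return term  # a symbol


term = {"x": ("f", Var("↓a")), "y": ("g", "int"), "z": (True, ("int", Var("↑c")))}
ground = substitute(term, {"f": True, "g": False}, {"↓a": "float", "↑c": "none"})
```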

Assume a KPN graph $G = (V, E, L)$ such that $|\mathcal{F}(G)| = l$, $|\mathcal{V}^{\uparrow}(G)| = m$ and
$|\mathcal{V}^{\downarrow}(G)| = n$ for some $l, m, n \ge 0$.

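Definition 2 (CSP-KPN). We define a CSP for a KPN graph G (CSP-KPN)
as follows: find a vector of Boolean values $b = (b_1, \dots, b_l)$ and
vectors of ground terms $t^{\uparrow} = (t^{\uparrow}_1, \dots, t^{\uparrow}_m)$, $t^{\downarrow} = (t^{\downarrow}_1, \dots, t^{\downarrow}_n)$ such that
for each $t \sqsubseteq t' \in C(G)$

$$t[f/b, \uparrow v/t^{\uparrow}, \downarrow v/t^{\downarrow}] \sqsubseteq t'[f/b, \uparrow v/t^{\uparrow}, \downarrow v/t^{\downarrow}],$$

where $f = (f_1, \dots, f_l)$, $\uparrow v = (\uparrow v_1, \dots, \uparrow v_m)$, $\downarrow v = (\downarrow v_1, \dots, \downarrow v_n)$. The tuple
$(b, t^{\uparrow}, t^{\downarrow})$ is called a solution.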

A CSP-KPN is decidable since the message definition language we introduced
can be seen as a term algebra, and decision problems for term algebras
are decidable m- 

5 Adjunct SAT 

The CSP-KPN solution algorithm presented in the next section is iterative and
takes advantage of the order-theoretical structure of the MDL (Proposition 1).

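Let $B_0 \subseteq B_1 \subseteq \dots \subseteq B_s$ be sets of Boolean constraints, and $a^{\uparrow}_i$ and $a^{\downarrow}_i$ be
vectors of semi-ground terms such that $|a^{\uparrow}_i| = |\mathcal{V}^{\uparrow}(G)|$ and $|a^{\downarrow}_i| = |\mathcal{V}^{\downarrow}(G)|$. The
vectors $a^{\uparrow}_i$ and $a^{\downarrow}_i$ are conditional approximations of the solution.

We seek the solution as a fixed point of a series of approximations in the
following form:

$$(B_0, a^{\uparrow}_0, a^{\downarrow}_0), \dots, (B_{s-1}, a^{\uparrow}_{s-1}, a^{\downarrow}_{s-1}), (B_s, a^{\uparrow}_s, a^{\downarrow}_s),$$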


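Algorithm 1 CSP-KPN(G)

    $c \leftarrow |C(G)|$
    $i \leftarrow 0$
    $B_0 \leftarrow \emptyset$
    $a^{\uparrow}_0 \leftarrow (\text{none}, \dots, \text{none})$
    $a^{\downarrow}_0 \leftarrow (\text{nil}, \dots, \text{nil})$
    repeat
        for $1 \le j \le c$ : $t_j \sqsubseteq t'_j \in C(G)$ do
            $(B_{i \cdot c + j}, a^{\uparrow}_{i \cdot c + j}, a^{\downarrow}_{i \cdot c + j}) \leftarrow \text{SOLVE}(B_{i \cdot c + j - 1}, a^{\uparrow}_{i \cdot c + j - 1}, a^{\downarrow}_{i \cdot c + j - 1}, \text{true}, t_j, t'_j)$
        end for
        $i \leftarrow i + 1$
    until $(\text{SAT}(B_{i \cdot c}), a^{\uparrow}_{i \cdot c}, a^{\downarrow}_{i \cdot c}) = (\text{SAT}(B_{(i-1) \cdot c}), a^{\uparrow}_{(i-1) \cdot c}, a^{\downarrow}_{(i-1) \cdot c})$
    if $B_{i \cdot c}$ is unsatisfiable then
        return Unsat
    else
        return $(\text{SATSol}(B_{i \cdot c}), a^{\uparrow}_{i \cdot c}[f/b], a^{\downarrow}_{i \cdot c}[f/b])$
    end if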

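where for every $1 \le k \le s$ and a vector of Boolean values $b$ that is a solution to
$\text{SAT}(B_k)$ (by $\text{SAT}(B_k)$ we mean the set of Boolean vectors satisfying $B_k$):

$$a^{\uparrow}_{k-1}[f/b] \sqsubseteq a^{\uparrow}_k[f/b] \quad\text{and}\quad a^{\downarrow}_k[f/b] \sqsubseteq a^{\downarrow}_{k-1}[f/b], \qquad (1)$$

where the elements of the vectors are compared pairwise. The starting point is
$B_0 = \emptyset$, $a^{\uparrow}_0 = (\text{none}, \dots, \text{none})$, $a^{\downarrow}_0 = (\text{nil}, \dots, \text{nil})$ and the series terminates as
soon as $\text{SAT}(B_s) = \text{SAT}(B_{s-1})$, $a^{\uparrow}_s = a^{\uparrow}_{s-1}$, $a^{\downarrow}_s = a^{\downarrow}_{s-1}$.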

The adjunct set of Boolean constraints potentially expands at every iteration
of the algorithm by inclusion of further logic formulas called assertions into its
conjunction as the algorithm processes constraints $C(G)$. Whether the set of
Boolean constraints actually expands or not can be determined by checking the
satisfiability of $\text{SAT}(B_k) \neq \text{SAT}(B_{k-1})$ for the current iteration $k$.

We argue below that if the original CSP-KPN is satisfiable then so is
$\text{SAT}(B_s)$ and that the tuple of vectors $(b_s, a^{\uparrow}_s[f/b_s], a^{\downarrow}_s[f/b_s])$ is a solution to
the former, where $b_s$ is a solution of $\text{SAT}(B_s)$. In other words, the iterations
terminate when the conditional approximation limits the term variables, and
when the adjunct SAT constrains the flags enough to ensure the satisfaction of
all CSP-KPN constraints. In general, the set $\text{SAT}(B_s)$ can have more than one
solution. We select one of them, denoted by $\text{SATSol}(B_s)$ in the algorithm.
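
The idea of approaching the solution by repeated passes over the constraints can be shown on a deliberately reduced problem. The sketch below is not Algorithm 1: it assumes only down-coerced variables, flat records whose fields are symbols, and no flags, so no adjunct SAT is generated. Every variable starts at nil and is lowered until nothing changes. The names `meet` and `solve_fixed_point` are invented for this illustration.

```python
def meet(r1, r2):
    """Greatest common junior of two flat records, or None if none exists."""
    merged = dict(r1)
    for label, symbol in r2.items():
        if merged.setdefault(label, symbol) != symbol:
            return None  # same label with different symbols: inconsistent
    return merged


def solve_fixed_point(variables, constraints):
    """Iterate over constraints `var ⊑ upper` until a fixed point is reached.

    upper is either a flat record (dict) or the name of another variable.
    Returns variable -> record, or None if the constraints are inconsistent.
    """
    approx = {v: {} for v in variables}  # start from nil, the weakest value
    changed = True
    while changed:
        changed = False
        for var, upper in constraints:
            bound = approx[upper] if isinstance(upper, str) else upper
            lowered = meet(approx[var], bound)
            if lowered is None:
                return None
            if lowered != approx[var]:
                approx[var] = lowered
                changed = True
    return approx


# Two variables on a cycle, as with the back edge of the kMeans component.
cyclic = [("a", {"x": "float"}), ("b", "a"), ("a", "b"), ("b", {"k": "int"})]
solution = solve_fixed_point(["a", "b"], cyclic)
```

The loop terminates because the set of labels is finite and records only grow, which mirrors the termination argument given for the full algorithm, although the full algorithm also has to handle choices, nested terms and guards.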


6 Algorithm 

In this section we present Algorithm 1, which solves CSP-KPN for a given KPN
graph $G = (V, E, L)$. It performs the following steps.

The algorithm iterates over the set of constraints $C(G)$ and at each step it
builds a closer approximation of the solution. The relation between two
consecutive approximations satisfies formulas (1).





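The function Solve() solves the constraint $t_j \sqsubseteq t'_j$ (see equation (2) in
Lemma 1) and updates the vectors $a^{\uparrow}_{i \cdot c + j}$ and $a^{\downarrow}_{i \cdot c + j}$ with new values.
Furthermore, it adds Boolean assertions presented below that ensure 1) satisfaction
of the constraint for any $b \in \text{SAT}(B_{i \cdot c + j})$ as provided by Definition 1 and 2)
well-formedness of the terms occurring in it.

The algorithm terminates if $\text{SAT}(B_{i \cdot c}) = \text{SAT}(B_{(i-1) \cdot c})$, $a^{\uparrow}_{i \cdot c} = a^{\uparrow}_{(i-1) \cdot c}$ and $a^{\downarrow}_{i \cdot c} = a^{\downarrow}_{(i-1) \cdot c}$.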

Well-formedness assertions for records and choices. Any pair of elements
in a well-formed record/choice cannot have equal labels. Therefore, for each
record $\{l_1(b_1): t_1, \dots, l_n(b_n): t_n\}$ and each choice $(: l_1(b_1): t_1, \dots, l_n(b_n): t_n :)$
occurring anywhere in $C(G)$ the following assertion is added to the SAT:

$$\bigwedge_{i \neq j,\ l_i = l_j} \neg(b_i \wedge b_j).$$

Well-formedness assertions for switches. A well-formed switch term must
have exactly one positive guard. Hence, for each switch $\langle b_1: t_1, \dots, b_n: t_n \rangle$
occurring anywhere in $C(G)$ the following assertion is added to the SAT:

$$(b_1 \vee \dots \vee b_n) \wedge \bigwedge_{i \neq j} \neg(b_i \wedge b_j).$$
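
The two kinds of well-formedness assertions translate directly into clauses. The following sketch assumes, for simplicity, that every guard is a single flag (general guards would first have to be converted to clauses), and it checks the result with a brute-force enumeration rather than a real SAT solver. The names `uniqueness_clauses`, `switch_clauses` and `models` are hypothetical.

```python
from itertools import combinations, product


def uniqueness_clauses(entries):
    """Clauses ¬(b_i ∧ b_j) for record/choice entries with equal labels.

    entries: list of (label, flag). A clause is a list of
    (flag, polarity) literals, read as a disjunction.
    """
    return [
        [(b1, False), (b2, False)]
        for (l1, b1), (l2, b2) in combinations(entries, 2)
        if l1 == l2
    ]


def switch_clauses(flags):
    """Clauses stating that exactly one of the switch guards is true."""
    at_least_one = [[(b, True) for b in flags]]
    at_most_one = [[(b1, False), (b2, False)] for b1, b2 in combinations(flags, 2)]
    return at_least_one + at_most_one


def models(clauses, flags):
    """Enumerate all assignments that satisfy every clause (brute force)."""
    found = []
    for values in product([False, True], repeat=len(flags)):
        assignment = dict(zip(flags, values))
        if all(any(assignment[f] == pol for f, pol in c) for c in clauses):
            found.append(assignment)
    return found


record_entries = [("x", "a"), ("x", "b"), ("y", "c")]
record_models = models(uniqueness_clauses(record_entries), ["a", "b", "c"])
switch_models = models(switch_clauses(["a", "b", "c"]), ["a", "b", "c"])
```

Brute force is exponential in the number of flags and is used here only to keep the example self-contained.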

Order assertions. We generate two kinds of order assertions.

1. If a variable $\square x$ is junior to two incommensurable, identically guarded terms
$\square x \sqsubseteq \langle \dots, b: t_1, \dots \rangle$ and $\square x \sqsubseteq \langle \dots, b: t_2, \dots \rangle$, where neither $t_1 \sqsubseteq t_2$ nor
$t_2 \sqsubseteq t_1$, the assertion $\neg b$ is added to the adjunct SAT.

2. For each $c \in C(G)$ of the form $\langle b_1: t_1, \dots, b_n: t_n \rangle \sqsubseteq \langle b'_1: t'_1, \dots, b'_m: t'_m \rangle$, the
assertion $\neg(b_i \wedge b'_j)$ is added to the adjunct SAT if $t_i \not\sqsubseteq t'_j$.

Further details are found in Appendix A.

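Lemma 1 (Loop invariant). Algorithm 1 finds a series of approximations in
the form of

$$(B_{k_0}, a^{\uparrow}_{k_0}, a^{\downarrow}_{k_0}), \dots, (B_{k_{s-1}}, a^{\uparrow}_{k_{s-1}}, a^{\downarrow}_{k_{s-1}}), (B_{k_s}, a^{\uparrow}_{k_s}, a^{\downarrow}_{k_s}),$$

where $k_i = i \cdot |C(G)|$, and such that the following holds for any $b_{k_i} \in \text{SAT}(B_{k_i})$:

1. $B_{k_i} \supseteq B_{k_{i-1}}$.

2. $a^{\uparrow}_{k_{i-1}}[f/b_{k_i}] \sqsubseteq a^{\uparrow}_{k_i}[f/b_{k_i}]$ and $a^{\downarrow}_{k_i}[f/b_{k_i}] \sqsubseteq a^{\downarrow}_{k_{i-1}}[f/b_{k_i}]$.

3. $\nexists\,(a', a'')$ such that

(a) $\dots \sqsubseteq a'[f/b_{k_i}] \dots$;

(b) $a''[f/b_{k_i}] \sqsubseteq \dots[f/b_{k_i}]$.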

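Proof.

Let $c = |C(G)|$. To construct $(B_{k_i}, a^{\uparrow}_{k_i}, a^{\downarrow}_{k_i})$ given
$(B_{k_{i-1}}, a^{\uparrow}_{k_{i-1}}, a^{\downarrow}_{k_{i-1}})$, the
algorithm iteratively calls the Solve() function (see Appendix A):

$$(B_r, a^{\uparrow}_r, a^{\downarrow}_r) \leftarrow \text{SOLVE}(B_{r-1}, a^{\uparrow}_{r-1}, a^{\downarrow}_{r-1}, \text{true}, t_j, t'_j),$$

where $r = k_{i-1} + j$, $1 \le j \le c$ and $k_i = k_{i-1} + c$. For any $t_j \sqsubseteq t'_j$, Solve()
constructs the Boolean constraints $B_r$, and finds $a^{\uparrow}_r$ and $a^{\downarrow}_r$ by solving the equation

$$t_j[f/b_{r-1}, \uparrow v/a^{\uparrow}_{r-1}, \downarrow v/a^{\downarrow}_r] = t'_j[f/b_{r-1}, \uparrow v/a^{\uparrow}_r, \downarrow v/a^{\downarrow}_{r-1}]. \qquad (2)$$

1. $B_{k_i} \supseteq B_{k_{i-1}}$ by construction: Solve() only adds new Boolean constraints to
the existing set.

2. Solve() iteratively constructs the local approximation for each constraint.
The series of local approximations converges to the global approximation.

3. Proof by contradiction. Assume that $(a^{\uparrow}_r, a^{\downarrow}_r)$ is a solution of (2) and there
exists another solution $(a', a'')$, such that $a' \neq a^{\uparrow}_r$ and $a'' \neq a^{\downarrow}_r$. Then

$$t_j[f/b_{r-1}, \uparrow v/a^{\uparrow}_{r-1}, \downarrow v/a''] = t'_j[f/b_{r-1}, \uparrow v/a', \downarrow v/a^{\downarrow}_{r-1}].$$

Two ground terms are equal only if they represent the same term, and,
therefore, $a' = a^{\uparrow}_r$ and $a'' = a^{\downarrow}_r$, which contradicts the initial assumption. □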

Theorem 1 (Termination). CSP-KPN(G) terminates after a finite number 
of steps for any KPN graph G. 

Proof. For a given graph G the number of flags, variables and labels for records 
and choices is bounded. There are two ways to produce new terms: either to 
add entries with new labels to records and choices, or to substitute subterms for 
terms. 

1. The number of new terms constructed by adding new entries is bounded 
because the number of labels in a given G is finite. 

2. The number of terms constructed by substituting subterms for other terms 
is bounded because a) the number of variables is finite (the algorithm does 
not generate new variables); b) after the variables have been instantiated, 
the category of the term cannot be changed, otherwise, the seniority relation 
would be violated. 

It implies that for each \(\uparrow v \in V^\uparrow(G)\) there exists a ground
term \(t\) such that \(\uparrow v \sqsubseteq t\), and for each
\(\downarrow v \in V^\downarrow(G)\) there exists a ground term \(t\) such that
\(t \sqsubseteq \downarrow v\). Provided that
\(|\mathrm{SAT}(B_{k_i})| \le |\mathrm{SAT}(B_{k_{i-1}})|\), the algorithm
terminates after a finite number of steps. □

Theorem 2. Assume a KPN graph \(G = (V, E, L)\). The set of constraints
\(C(G)\) is inconsistent if and only if CSP-KPN(G) returns Unsat.

Proof. As the initial approximation, the algorithm selects the weakest
approximation \((\emptyset, (\mathrm{none}, \dots, \mathrm{none}), (\mathrm{nil}, \dots, \mathrm{nil}))\).
It follows from Lemma 1 that the algorithm iterates over all possible
approximations in consecutive order starting from
\((B_{k_0}, a^\uparrow_{k_0}, a^\downarrow_{k_0})\). Therefore, the algorithm
cannot skip a solution if one exists. By Theorem 1 the algorithm terminates
after a finite number of steps. Hence, it returns Unsat if and only if the set
of constraints \(C(G)\) cannot be satisfied. □


message _1_init(vector<vector<double>> img);
message _2_error(string msg);

variant _1_read_color(string fname) { _1_init(...); ... _2_error(...); }
variant _1_read_grayscale(string fname) {...}
variant _1_read_unchanged(string fname) {...}

(a) The source code

IN
1: (: read_color(c): {fname: string | $_rc},
      read_grayscale(g): {fname: string | $_rg},
      read_unchanged(u): {fname: string | $_ru} | $^r :)

OUT
1: (: init(or c g u): {img: vector<vector<double>> | $_ro1} | $^r :)
2: (: error(or c g u): {msg: string | $_ro2} :)

$_rc <= $_ro1; $_rg <= $_ro1; $_ru <= $_ro1;
$_rc <= $_ro2; $_rg <= $_ro2; $_ru <= $_ro2;

(b) The interface

Fig. 4: The source code and the interface of the component read of the image
processing algorithm

7 Communication Protocol 


In this section we demonstrate interfaces with flow inheritance and code
customisation using the example from Section 2. The interfaces are defined as
choice-of-records terms. Labels in the choice term of the input interface
correspond to the names of functions that can process messages tagged with the
corresponding labels. Output messages are produced by calling special functions
called salvos. The name of a salvo corresponds to one of the labels in the
output choice term. The compatibility of two components connected by a channel
is defined by the seniority relation.

Consider the source code and the interface of the component read in Fig. 4.
Integers that have been added as prefixes to functions in the code specify the
channels that messages are received from and sent to. In the interfaces we use
the prefixes $^ and $_ before identifiers to denote up- and down-coerced
variables, respectively.

A tail variable $^r in the interface enables flow inheritance for choices:
variants from the input channel that cannot be processed by the component (i.e.
all variants other than read_color, read_grayscale and read_unchanged) are
absorbed by $^r. Thus, the messages of type M r \ that contain the name of the
image file are processed by the component, and the messages of type Mu are
inherited to the output and forwarded directly to the component init.
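The effect of the tail variable can be sketched in Python (an illustration only; the solver itself is written in OCaml). The function `split_choice`, the dictionary encoding of choice terms and the variant label `stop` are assumptions made for this example and do not come from the toolchain.

```python
def split_choice(variants, handled):
    """Split a choice term into processed and inherited variants.

    `variants` maps variant labels to their record terms; `handled` is the
    set of labels named explicitly in the component's input interface.
    Everything else is what the tail variable would absorb and pass on.
    """
    processed = {l: r for l, r in variants.items() if l in handled}
    inherited = {l: r for l, r in variants.items() if l not in handled}
    return processed, inherited


handled = {"read_color", "read_grayscale", "read_unchanged"}
incoming = {
    "read_color": {"fname": "string"},
    "stop": {},  # hypothetical variant that the component read does not know
}
processed, inherited = split_choice(incoming, handled)
print(sorted(processed))  # ['read_color']
print(sorted(inherited))  # ['stop']
```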

Flow inheritance for records is realised by the down-coerced variables $_rc,
$_rg, $_ru, $_ro1, $_ro2, and a set of auxiliary constraints. A record in the
input message contains an entry with the label K, which a processing function
does not expect. After solving the CSP, the entry is added to the tail variable
$_ro1, because the solver deduces that the element with the label K is required
by the component init.
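A minimal sketch of this deduction for a single record, written in Python for illustration: the function `tail_entries`, the dictionary encoding of records and the type `int` given to K are assumptions of the example, and the real solver reaches the same result through the constraints rather than by direct set operations.

```python
def tail_entries(incoming, expected, required_downstream):
    """Entries that the down-coerced tail variable has to carry.

    `incoming` maps labels of the input record to types, `expected` is the
    set of labels the processing function consumes, and
    `required_downstream` is the set of labels a later component needs.
    """
    extra = {l: t for l, t in incoming.items() if l not in expected}
    return {l: t for l, t in extra.items() if l in required_downstream}


incoming = {"fname": "string", "K": "int"}  # type of K is assumed
print(tail_entries(incoming, {"fname"}, {"img", "K"}))  # {'K': 'int'}
```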

Furthermore, we use flags c, g and u to exclude the code that is not used in
the context. The guards in the output interface employ the joint set of flags
from the input variants that can fire salvos specified in the output interface.
In the example all three functions can fire init and error salvos; accordingly,
the salvos' guards are \(c \vee g \vee u\). The solver deduces that the variants
read_grayscale and read_unchanged cannot receive any messages, and, therefore,
their respective processing functions can be excluded from the code.
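Once the flags are assigned, code exclusion reduces to evaluating the guards. The following Python sketch illustrates this under assumed names (`live_variants`, `salvo_guard`) and an assumed assignment that matches the deduction described above.

```python
def live_variants(variant_flags, assignment):
    """Return the variants whose flag the solver set to true.

    `variant_flags` maps a processing function to its flag, e.g.
    {"read_color": "c"}; `assignment` maps flags to booleans.
    """
    return [name for name, flag in variant_flags.items() if assignment[flag]]


def salvo_guard(flags, assignment):
    """Evaluate a disjunctive guard such as (or c g u)."""
    return any(assignment[f] for f in flags)


variant_flags = {"read_color": "c", "read_grayscale": "g", "read_unchanged": "u"}
assignment = {"c": True, "g": False, "u": False}  # assumed solver output

print(live_variants(variant_flags, assignment))  # ['read_color']
print(salvo_guard(["c", "g", "u"], assignment))  # True
```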

To facilitate decontextualisation we introduce a wrapper for every component 
called a shell: an auxiliary configuration file that provides facilities for renaming 
labels in output records and choices and changing the routing of output messages. 

The source code and the interfaces for the other two components are available 
in the repository mi. 

8 Implementation 

We implemented the solver [16] for the CSP-KPN in OCaml. It works on top of
the PicoSAT [17] library, which is used as a subsolver for the Boolean
assertions. The input to the solver is a set of constraints, and the output is
an assignment to flags and term variables.
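To make the role of the subsolver concrete, the sketch below uses a brute-force search as a stand-in for a real SAT solver such as PicoSAT. It is written in Python for illustration and does not call PicoSAT; the function name and the integer clause encoding are assumptions.

```python
from itertools import product


def solve_bruteforce(clauses, num_vars):
    """Return a satisfying assignment {var: bool}, or None if unsatisfiable.

    Clauses are lists of non-zero integers; a negative literal -v means
    "flag v is false". The search is exponential and only meant for
    tiny examples.
    """
    for bits in product([False, True], repeat=num_vars):
        assignment = {v + 1: bits[v] for v in range(num_vars)}
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses):
            return assignment
    return None


# Exactly one of two switch guards, plus an order assertion "not b_1".
print(solve_bruteforce([[1, 2], [-1, -2], [-1]], 2))  # {1: False, 2: True}
# Adding "not b_2" as well leaves no well-formed switch.
print(solve_bruteforce([[1, 2], [-1, -2], [-1], [-2]], 2))  # None
```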

We also developed a toolchain in C++ and OCaml that performs the interface 
reconciliation in five steps: 

1. Given a set of C++ sources (the components), augment them with macros 
acting as placeholders for the code that enables flow-inheritance. 

2. Derive the interfaces from the code. 

3. Given the interfaces and a netlist that specifies a KPN graph, construct the 
constraints to be passed on to the CSP-KPN algorithm. 

4. Run the solver. 

5. Based on the solution, generate header files for every component with macro 
definitions. In addition, the tool generates the API functions to be called 
when a component sends or receives a message. 
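Step 5 can be pictured as a small text generator. The Python sketch below is hypothetical: the macro names (`FLAG_c` and so on) and the include-guard scheme are invented for the example and are not the names produced by the actual toolchain.

```python
def header_text(component, assignment):
    """Render a C++ header that defines one macro per solved flag.

    `assignment` maps flag names to booleans as returned by the solver.
    """
    guard = f"{component.upper()}_FLAGS_H"
    lines = [f"#ifndef {guard}", f"#define {guard}"]
    for flag in sorted(assignment):
        lines.append(f"#define FLAG_{flag} {1 if assignment[flag] else 0}")
    lines.append("#endif")
    return "\n".join(lines) + "\n"


print(header_text("read", {"c": True, "g": False, "u": False}))
```

A component source could then wrap each processing function in a conditional on its flag macro, so that unused variants are dropped at compile time.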

Advantages of the presented design are the following:

- Interfaces and the code behind them can be generic as long as they are
sufficiently configurable. No communication between component designers is
necessary to ensure consistency in the design.

- Configuration and compilation of every component is separated from the rest
of the application. This prevents source code leaks in proprietary software
running in the Cloud, which is otherwise a serious problem. For example,
proprietary C++ libraries that use templates cannot be distributed in binary
form due to restrictions of the language's static specialisation mechanism.



9 Conclusion and Future Work 


We have presented a new static mechanism for coordinating component interfaces
based on CSP and SAT that checks compatibility of component interfaces
connected in a network with support of overloading and structural subtyping.
We developed a fully decoupled Message Definition Language that can be used
in the context of KPN for coordinating components written in any programming
language. We defined the interface of C++ components to demonstrate the
binding between the MDL and message processing functions. Our techniques
support genericity, inheritance and structural subtyping, thanks to the order
relation defined on MDL terms.

On the theory side, we presented the CSP solution algorithm, showed its 
correctness and identified the termination condition. Although we assume that 
the algorithm is NP-complete because of the SAT problem, which needs to be 
solved as a subproblem, the complexity of the algorithm will be evaluated in 
further research. 

The next step will be to support multiple flow inheritance in the MDL, to
enable combined structures with inheritance (for example,
\((\mathrm{union}\ \downarrow a\ \downarrow b)\) represents a record that
contains a union of entries associated with records \(\downarrow a\) and
\(\downarrow b\)). This would allow one to design components that perform
synchronisation and merge multiple messages into one while preserving the
inheritance mechanism of a vertex.

In the context of Cloud, our results may prove useful to the software-as- 
service community since we can support much more generic interfaces than are 
currently available without exposing the source code of proprietary software 
behind them. Building KPNs the way we do could enable service providers to 
configure a solution for a network customer based on components that they have 
at their disposal as well as those provided by other providers and the customer 
themselves, all solely on the basis of interface definitions and automatic tuning 
to nonlocal requirements. 
A Appendix: Solve function


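Algorithm 2 Solve(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'\))

\(B_i \leftarrow\) AssertWellFormed(AssertWellFormed(\(B_i, b, t\)), \(b, t'\))
if ((\(t = \mathrm{none}\) or \(t' = \mathrm{nil}\)) and (\(t \neq \mathrm{none}\) or \(t' \neq \mathrm{nil}\)))
      or (\(t\) and \(t'\) are ground, and \(t = t'\)) then
    return \((B_i, a^\uparrow_i, a^\downarrow_i)\)
else if \(t = \uparrow v\) and \(t' = \uparrow v'\) then
    \(B_{i+1}, a^\uparrow_{i+1} \leftarrow\) set a new approximation for \(\uparrow v'\) equal to the one of \(\uparrow v\)
    return \((B_{i+1}, a^\uparrow_{i+1}, a^\downarrow_i)\)
else if \(t = \downarrow v\) and \(t' = \downarrow v'\) then
    \(B_{i+1}, a^\downarrow_{i+1} \leftarrow\) set a new approximation for \(\downarrow v\) equal to the one of \(\downarrow v'\)
    return \((B_{i+1}, a^\uparrow_i, a^\downarrow_{i+1})\)
else if \(t\) is nil, symbol, tuple or record, and \(t' = \downarrow v'\) then
    return Solve(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'[\uparrow v/a^\uparrow_i, \downarrow v/a^\downarrow_i]\))
else if \(t\) is a choice and \(t' = \uparrow v'\) then
    \(B_{i+1}, a^\uparrow_{i+1} \leftarrow\) set a new approximation for \(\uparrow v'\) as \(t[\uparrow v/a^\uparrow_i, \downarrow v/a^\downarrow_i]\) when \(b\)
    return \((B_{i+1}, a^\uparrow_{i+1}, a^\downarrow_i)\)
else if \(t = \downarrow v\), and \(t'\) is nil, symbol, tuple or record then
    \(B_{i+1}, a^\downarrow_{i+1} \leftarrow\) set a new approximation for \(\downarrow v\) as \(t'[\uparrow v/a^\uparrow_i, \downarrow v/a^\downarrow_i]\) when \(b\)
    return \((B_{i+1}, a^\uparrow_i, a^\downarrow_{i+1})\)
else if \(t = \uparrow v\) and \(t'\) is a choice then
    return Solve(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t[\uparrow v/a^\uparrow_i, \downarrow v/a^\downarrow_i], t'\))
else if \(t\) and \(t'\) are tuples then
    return SolveTupleTuple(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'\))
else if \(t = \mathrm{nil}\) and \(t'\) is a record then
    return SolveNilRecord(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t'\))
else if \(t\) and \(t'\) are records then
    return SolveRecordRecord(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'\))
else if \(t\) and \(t'\) are choices then
    return SolveChoiceChoice(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'\))
else if \(t\) or \(t'\) is a switch then
    return SolveSwitch(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'\))
else
    return \((B_i \cup \{\neg b\}, a^\uparrow_i, a^\downarrow_i)\)
end if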






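Algorithm 3 AssertWellFormed(\(B_i, b, t\))

if \(t\) is a record \(\{l_1(b_1): t_1, \dots, l_n(b_n): t_n\}\) or a choice \((: l_1(b_1): t_1, \dots, l_n(b_n): t_n :)\) then
    \(B_{i+1} \leftarrow B_i \cup \{b \rightarrow \bigwedge_{i \neq j:\ l_i = l_j} \neg (b_i \wedge b_j)\}\)
else if \(t\) is a switch \((b_1 : t_1, \dots, b_n : t_n)\) then
    \(B_{i+1} \leftarrow B_i \cup \{b_1 \vee \dots \vee b_n\} \cup \bigcup_{i \neq j} \{b \rightarrow \neg (b_i \wedge b_j)\}\)
else
    \(B_{i+1} \leftarrow B_i\)
end if
return \(B_{i+1}\)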


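Algorithm 4 SolveTupleTuple(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'\))

Let \(t\) be of the form \((t_1 \dots t_n)\)
\(g \leftarrow t'[\uparrow v/a^\uparrow_i, \downarrow v/a^\downarrow_i]\)
if \(g = (t'_1 \dots t'_m)\) and \(n = m\) then
    for \(j\): \(1 \le j \le n\) do
        \(B_{i+j}, a^\uparrow_{i+j}, a^\downarrow_{i+j} \leftarrow\) Solve(\(B_{i+j-1}, a^\uparrow_{i+j-1}, a^\downarrow_{i+j-1}, b, t_j, t'_j\))
    end for
    return \((B_{i+n}, a^\uparrow_{i+n}, a^\downarrow_{i+n})\)
else
    return \((B_i \cup \{\neg b\}, a^\uparrow_i, a^\downarrow_i)\)
end if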


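Algorithm 5 SolveNilRecord(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t'\))

\(g \leftarrow t'[\uparrow v/a^\uparrow_i, \downarrow v/a^\downarrow_i]\)
Let \(g\) be of the form \(\{l'_1(b'_1): t'_1, \dots, l'_m(b'_m): t'_m\}\)
return \((B_i \cup \{b \rightarrow \bigwedge_{j=1}^{m} \neg b'_j\}, a^\uparrow_i, a^\downarrow_i)\)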
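The implication added by SolveNilRecord splits into one binary clause per record entry. A minimal Python sketch, assuming DIMACS-style integer literals and a hypothetical function name:

```python
def nil_record_clauses(b, entry_guards):
    """Clauses for b -> (not b'_1 and ... and not b'_m).

    Each implication b -> not b'_j is the binary clause (not b or not b'_j).
    """
    return [[-b, -g] for g in entry_guards]


print(nil_record_clauses(1, [2, 3]))  # [[-1, -2], [-1, -3]]
```

In words: if the constraint is active, no entry of the senior record may be present, so the record must be empty to be senior to nil.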


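Algorithm 6 SolveRecordRecord(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'\))

\(g \leftarrow t'[\uparrow v/a^\uparrow_i, \downarrow v/a^\downarrow_i]\)
Let \(g\) be of the form \(\{l'_1(b'_1): t'_1, \dots, l'_m(b'_m): t'_m\}\)
if \(t = \{l_1(b_1): t_1, \dots, l_n(b_n): t_n\}\) then
    for \(j\): \(1 \le j \le m\) do
        if \(\exists k\): \(l_k \in t\), \(l_k = l'_j\) then
            \(B_{i+j}, a^\uparrow_{i+j}, a^\downarrow_{i+j} \leftarrow\) Solve(\(B_{i+j-1}, a^\uparrow_{i+j-1}, a^\downarrow_{i+j-1}, b \wedge b'_j \wedge b_k, t_k, t'_j\))
        else
            \(B_{i+j} \leftarrow B_{i+j-1} \cup \{b \rightarrow \neg b'_j\}\)
        end if
    end for
else if \(t = \{l_1(b_1): t_1, \dots, l_n(b_n): t_n \mid \downarrow v\}\) then
    for \(j\): \(1 \le j \le m\) do
        if \(\exists k\): \(l_k \in t\), \(l_k = l'_j\) then
            \(B_{i+j}, a^\uparrow_{i+j}, a^\downarrow_{i+j} \leftarrow\) Solve(\(B_{i+j-1}, a^\uparrow_{i+j-1}, a^\downarrow_{i+j-1}, b \wedge b'_j \wedge b_k, t_k, t'_j\))
        else
            \(B_{i+j}, a^\uparrow_{i+j}, a^\downarrow_{i+j} \leftarrow\) set a new approximation for \(\downarrow v\) as \(\{l'_j(b'_j): t'_j\}\) when \(b\)
        end if
    end for
end if
return \((B_{i+m}, a^\uparrow_{i+m}, a^\downarrow_{i+m})\)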















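Algorithm 7 SolveChoiceChoice(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'\))

\(g \leftarrow t[\uparrow v/a^\uparrow_i, \downarrow v/a^\downarrow_i]\)
Let \(g\) be of the form \((: l_1(b_1): t_1, \dots, l_m(b_m): t_m :)\)
if \(t' = (: l'_1(b'_1): t'_1, \dots, l'_n(b'_n): t'_n :)\) then
    for \(j\): \(1 \le j \le m\) do
        if \(\exists k\): \(l'_k \in t'\), \(l'_k = l_j\) then
            \(B_{i+j}, a^\uparrow_{i+j}, a^\downarrow_{i+j} \leftarrow\) Solve(\(B_{i+j-1}, a^\uparrow_{i+j-1}, a^\downarrow_{i+j-1}, b \wedge b_j \wedge b'_k, t_j, t'_k\))
        else
            \(B_{i+j} \leftarrow B_{i+j-1} \cup \{b \rightarrow \neg b_j\}\)
        end if
    end for
else if \(t' = (: l'_1(b'_1): t'_1, \dots, l'_n(b'_n): t'_n \mid \uparrow v :)\) then
    for \(j\): \(1 \le j \le m\) do
        if \(\exists k\): \(l'_k \in t'\), \(l'_k = l_j\) then
            \(B_{i+j}, a^\uparrow_{i+j}, a^\downarrow_{i+j} \leftarrow\) Solve(\(B_{i+j-1}, a^\uparrow_{i+j-1}, a^\downarrow_{i+j-1}, b \wedge b_j \wedge b'_k, t_j, t'_k\))
        else
            \(B_{i+j}, a^\uparrow_{i+j}, a^\downarrow_{i+j} \leftarrow\) set a new approximation for \(\uparrow v\) as \((: l_j(b_j): t_j :)\) when \(b\)
        end if
    end for
end if
return \((B_{i+m}, a^\uparrow_{i+m}, a^\downarrow_{i+m})\)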
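Read with all guards true and without variables, tails, switches or nil, Algorithms 4, 6 and 7 amount to a simple recursive check of the seniority relation on ground terms. The Python sketch below is a simplification made for illustration; the term encoding (strings for symbols, tuples for tuples, dictionaries for records, a `Choice` subclass for choices) and the function name `junior` are assumptions.

```python
class Choice(dict):
    """A choice term (: l1: t1, ... :); plain dicts stand for records."""


def junior(t, u):
    """Return True if the ground term t is junior to the ground term u."""
    if isinstance(t, str) and isinstance(u, str):
        return t == u  # symbols must coincide
    if isinstance(t, tuple) and isinstance(u, tuple):
        # tuples: same length, compared element by element (Algorithm 4)
        return len(t) == len(u) and all(junior(a, b) for a, b in zip(t, u))
    if isinstance(t, Choice) and isinstance(u, Choice):
        # choices: every variant of t must exist in u (Algorithm 7)
        return all(l in u and junior(t[l], u[l]) for l in t)
    if isinstance(t, Choice) or isinstance(u, Choice):
        return False  # a choice is not comparable with a record
    if isinstance(t, dict) and isinstance(u, dict):
        # records: every entry of u must exist in t (Algorithm 6)
        return all(l in t and junior(t[l], u[l]) for l in u)
    return False


print(junior({"fname": "string", "K": "int"}, {"fname": "string"}))  # True
print(junior(Choice(a={}), Choice(a={}, b={})))                      # True
```

The two directions are deliberately opposite: a junior record may carry extra entries, while a junior choice may offer fewer variants.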


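Algorithm 8 SolveSwitch(\(B_i, a^\uparrow_i, a^\downarrow_i, b, t, t'\))

if \(t = (b_1 : t_1, \dots, b_n : t_n)\) then
    for \(j\): \(1 \le j \le n\) do
        \(B_{i+j}, a^\uparrow_{i+j}, a^\downarrow_{i+j} \leftarrow\) Solve(\(B_{i+j-1}, a^\uparrow_{i+j-1}, a^\downarrow_{i+j-1}, b, t_j, t'\))
    end for
else if \(t' = (b'_1 : t'_1, \dots, b'_n : t'_n)\) then
    for \(j\): \(1 \le j \le n\) do
        \(B_{i+j}, a^\uparrow_{i+j}, a^\downarrow_{i+j} \leftarrow\) Solve(\(B_{i+j-1}, a^\uparrow_{i+j-1}, a^\downarrow_{i+j-1}, b, t, t'_j\))
    end for
end if
return \((B_{i+n}, a^\uparrow_{i+n}, a^\downarrow_{i+n})\)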









